(57) Abstract: The present invention relates to a scoring system for the prediction of cancer recurrence. More particularly, the
present invention concerns the selection of genes and/or proteins, and the generation of formulae with the selected genes and/or
proteins, for the prediction of cancer recurrence by measuring the expression of genes and/or proteins of human tumor tissues and
comparing their patterns with those of the gene and/or protein expression of human primary tumors from patients who have cancer
recurrence and those who do not have cancer recurrence. The present invention also relates to a kit for performing the method
of the present invention, comprising a DNA chip, oligonucleotide chip, protein chip, peptides, antibodies, probes and primers that
are necessary for effecting DNA microarrays, oligonucleotide microarrays, protein arrays, northern blotting, in situ hybridization,
RNase protection assays, western blotting, ELISA assays, and reverse transcription polymerase chain reaction (hereinafter referred to as
RT-PCR) to examine the expression of 2 or more genes and/or proteins, preferably 4 or more genes and/or proteins,
more preferably 6 or more genes and/or proteins, and most preferably 12 or more genes and/or proteins, that are indicative of
cancer recurrence.
Description

Scoring System for the Prediction of Cancer Recurrence 

The present invention relates to a scoring system for the prediction of cancer recurrence. More
particularly, the present invention concerns the selection of genes and/or proteins, and the
generation of formulae with the selected genes and/or proteins, for the prediction of cancer recurrence
by measuring the expression of genes and/or proteins of human tumor tissues and comparing their
patterns with those of the gene and/or protein expression of human primary tumors from patients
who have cancer recurrence and those who do not have cancer recurrence.


The present invention also relates to a kit for performing the method of the present invention,
comprising a DNA chip, oligonucleotide chip, protein chip, peptides, antibodies, probes and primers
that are necessary for effecting DNA microarrays, oligonucleotide microarrays, protein arrays,
northern blotting, in situ hybridization, RNase protection assays, western blotting, ELISA assays, and
reverse transcription polymerase chain reaction (hereinafter referred to as RT-PCR) to examine the
expression of 2 or more genes and/or proteins, preferably 4 or more genes and/or
proteins, more preferably 6 or more genes and/or proteins, and most preferably 12 or more
genes and/or proteins, that are indicative of cancer recurrence.

Background of the invention 

Cancer is one of the major causes of death in the world. The overall prevalence
rate of cancer is about 1% of the population, and the yearly incidence rate is about 0.5%. About one
out of ten patients discharged from hospitals has cancer as the primary diagnosis. The main
existing treatment modalities are surgical resection, radiotherapy, chemotherapy, and biological
therapy including hormonal therapy. Furthermore, newly developed biotechnologies have been
offering new treatment modalities, such as gene therapy. Nevertheless, cancer is a dreaded disease
because in most cases no really effective treatment is available. One of the major difficulties
of cancer treatment is the ability of cancer cells to become resistant to drugs and to spread to other
tissue sites, where they can generate new tumors, which often results in recurrence. If cancer
recurrence could be predicted before it occurs, such cancer could become curable by local treatment
with surgery.



Among various tumors, hepatocellular carcinoma (hereinafter referred to as HCC) is
one of the most common fatal cancers in the world, and its incidence is increasing in
many countries including the USA, Japan, China and European countries. Both hepatitis B virus
(hereinafter referred to as HBV) and hepatitis C virus (hereinafter referred to as HCV) infections
can cause HCC. In fact, the increase in HCC patients parallels an increase in chronic
HCV infection (El-Serag, H.B. & Mason, A.C. Rising incidence of hepatocellular carcinoma in the
United States, N. Engl. J. Med. 340, 745-750 (1999), and Okuda, K. Hepatocellular carcinoma, J.
Hepatol. 32, 225-237 (2000)). Despite the rising incidence of HCC, there is no promising
therapy for this disease. The major problem in the treatment of HCC is intrahepatic metastasis.
Recurrence was observed in 30 to 50% of HCC patients who had received hepatic resection (Iizuka,
N. et al. NM23-H1 and NM23-H2 messenger RNA abundance in human hepatocellular carcinoma,
Cancer Res. 55, 652-657 (1995), Yamamoto, J. et al. Recurrence of hepatocellular carcinoma after
surgery, Br. J. Surg. 83, 1219-1222 (1996), and Poon, R.T. et al. Different risk factors and
prognosis for early and late intrahepatic recurrence after resection of hepatocellular carcinoma,
Cancer 89, 500-507 (2000)). Although the pathologic TNM staging system has been applied in the
treatment of HCC, this system is poorly predictive of recurrence in patients who undergo hepatic
resection (Izumi, R. et al. Prognostic factors of hepatocellular carcinoma in patients undergoing
hepatic resection, Gastroenterology 106, 720-727 (1994)). Although a number of molecules have been
proposed as predictive markers for HCC, none of them has proven to be clinically useful (Iizuka, N.
et al. NM23-H1 and NM23-H2 messenger RNA abundance in human hepatocellular carcinoma,
Cancer Res. 55, 652-657 (1995), Hsu, H.C. et al. Expression of p53 gene in 184 unifocal
hepatocellular carcinomas: association with tumor growth and invasiveness, Cancer Res. 53,
4691-4694 (1993), and Mathew, J. et al. CD44 is expressed in hepatocellular carcinomas showing
vascular invasion, J. Pathol. 179, 74-79 (1996)). Thus, any method to predict recurrence would be
quite valuable for understanding cancer mechanisms and for establishing new therapies for cancer.
However, because traditional methods of predicting recurrence have technological limitations,
and further limitations may be attributable to the high inter-patient heterogeneity of tumors, it
is necessary to devise a novel method to characterize tumors and predict cancer recurrence.

The recent development of microarray technologies, which allow one to perform parallel
expression analysis of a large number of genes, has opened up a new era in medical science (Schena,
M. et al. Quantitative monitoring of gene expression patterns with a complementary DNA
microarray, Science 270, 467-470 (1995), and DeRisi, J. et al. Use of a cDNA microarray to
analyze gene expression patterns in human cancer, Nature Genet. 14, 457-460 (1996)). In
particular, cDNA microarray studies of the gene expression of tumors have provided significant
insights into properties of malignant tumors such as prognosis and drug sensitivity (Alizadeh,
A.A. et al. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling,
Nature 403, 503-511 (2000), and Scherf, U. et al. A gene expression database for the molecular
pharmacology of cancer, Nature Genet. 24, 236-244 (2000)).

Recently, supervised learning has been introduced into gene-expression analysis (Brazma, A. & Vilo,
J. Gene expression data analysis, FEBS Lett. 480, 17-24 (2000), and Kell, D.B. & King, R.D. On
the optimization of classes for the assignment of unidentified reading frames in functional genomics
programs: the need for machine learning, Trends Biotechnol. 18, 93-98 (2000)). Because it uses classified
samples, supervised learning has the decisive advantage of exploiting a priori knowledge about the
nature of the data (Duda, R.O. et al. Pattern classification, John Wiley & Sons (2001), and
Jain, A.K. et al. Statistical pattern recognition: A review, IEEE Trans. Pattern Analysis and
Machine Intelligence 22, 4-37 (2000)). However, none of the previously published supervised learning
methods directly evaluates combinations of genes, and thus none can utilize information concerning the
statistical characteristics, i.e., the structure of the distribution, of genes (Golub, T.R. et al. Molecular
classification of cancer: class discovery and class prediction by gene expression monitoring, Science
286, 531-537 (1999), and Brown, M.P. et al. Knowledge-based analysis of microarray gene
expression data by using support vector machines, Proc. Natl Acad. Sci. USA 97, 262-267
(2000)).



Scoring systems that are predictive of cancer recurrence are created by analyzing the DNA
microarray data with supervised learning in statistical pattern recognition (Duda, R.O. et al. Pattern
classification, John Wiley & Sons (2001)).

Supervised learning in statistical pattern recognition has been successfully applied to resolve a
variety of issues such as document classification, speech recognition, biometric recognition, and
remote sensing (Jain, A.K. et al. Statistical pattern recognition: A review, IEEE Trans. Pattern
Analysis and Machine Intelligence 22, 4-37 (2000)).

In the present invention, the inventors provide a scoring system to predict cancer
recurrence by analyzing the expression of genes and/or proteins of human primary tumors. That is,
the invention concerns a method for the prediction of cancer recurrence which comprises measuring
the expression of genes and/or proteins of human tumor tissues, and comparing it with the
expression of the genes and/or proteins of human primary tumors from patients who have cancer
recurrence and those who do not have cancer recurrence.

Brief Description of the Drawings 

Figure 1 illustrates the procedure of gene selection (Steps 1-7) and evaluation (Steps 8-10) of the scoring
system with the optimal gene subset.

Figure 2 illustrates the optimal number of genes.

Figure 3 illustrates the average differences of the mRNA for the genes selected for the prediction of early
intrahepatic recurrence. The average differences of the mRNA for the 12 genes were compared between
Group A (indicated as A) and Group B (indicated as B).

Figure 4 illustrates the relation between virus type, TNM stage, and scores (T values) for the
prediction of early intrahepatic recurrence. Using the optimal subset of 12 genes, the scoring system
created with 30 training samples was evaluated with 3 test samples. This operation was
independently repeated 10 times. The T values for all of the test samples were calculated. Early
intrahepatic recurrence was predicted when the T value was below zero. Regardless of stage and virus
type, all HCCs with a negative T value had early intrahepatic recurrences and all HCCs with a
positive T value had no recurrences. Filled symbols, Group A (patients with early intrahepatic recurrence);
open symbols, Group B (patients without early intrahepatic recurrence); O, stage I; 0, stage II; A, stage
IIIA; □, stage IVA. B, HBV-positive; C, HCV-positive; N, HBV- and HCV-double negative.

Figure 5 illustrates the scoring system.

Detailed explanation of the invention 

In the present invention, human tissues from tumors including those of brain, lung,
breast, stomach, liver, pancreas, gallbladder, colon, rectum, kidney, bladder, ovary, uterus, prostate,
and skin are used. After human tissues are resected during surgery, it is preferable that they are
immediately frozen in liquid nitrogen or acetone containing dry ice and stored at between -70 and
-80 °C until use, with or without being embedded in O.C.T. compound (Sakura-Seiki, Tokyo, Japan,
Catalog No. 4583).

Expression of genes and/or proteins of tumor tissues from patients who are tested for the probability
of cancer recurrence is analyzed by measuring the levels of RNA and/or proteins. In many cases,
the levels of RNA and/or proteins are determined by measuring fluorescence from substances
including fluorescein and rhodamine, chemiluminescence from luminol, radioactivity of
radioactive materials including ³H, ¹⁴C, ³⁵S, ³³P, ³²P, and ¹²⁵I, and optical densities. Expression
levels of RNA and/or proteins are determined by known methods including DNA microarray
(Schena, M. et al. Quantitative monitoring of gene expression patterns with a complementary DNA
microarray, Science 270, 467-470 (1995), and Lipshutz, R.J. et al. High density synthetic
oligonucleotide arrays, Nature Genet. 21, 20-24 (1999)), RT-PCR (Weis, J.H. et al. Detection of
rare mRNAs via quantitative RT-PCR, Trends Genet. 8, 263-264 (1992), and Bustin, S.A.
Absolute quantification of mRNA using real-time reverse transcription polymerase chain reaction
assays, J. Mol. Endocrinol. 25, 169-193 (2000)), northern blotting and in situ hybridization (Parker,
R.M. & Barnes, N.M. mRNA: detection in situ and northern hybridization, Methods Mol. Biol. 106,
247-283 (1999)), RNase protection assay (Hod, Y. A simplified ribonuclease protection assay,
Biotechniques 13, 852-854 (1992), and Saccomanno, C.F. et al. A faster ribonuclease protection assay,
Biotechniques 13, 846-850 (1992)), western blotting (Towbin, H. et al. Electrophoretic transfer of
proteins from polyacrylamide gels to nitrocellulose sheets, Proc. Natl Acad. Sci. USA 76,
4350-4354 (1979), and Burnette, W.N. Western blotting: Electrophoretic transfer of proteins from
sodium dodecyl sulfate-polyacrylamide gels to unmodified nitrocellulose and radioiodinated
protein A, Anal. Biochem. 112, 195-203 (1981)), ELISA assays (Engvall, E. & Perlmann, P.
Enzyme-linked immunosorbent assay (ELISA): Quantitative assay of immunoglobulin G,
Immunochemistry 8, 871-879 (1971)), and protein arrays (Merchant, M. & Weinberger, S.R.
Review: Recent advancements in surface-enhanced laser desorption/ionization-time of flight-mass
spectrometry, Electrophoresis 21, 1164-1177 (2000), and Paweletz, C.P. et al. Rapid protein display
profiling of cancer progression directly from human tissue using a protein biochip, Drug
Development Research 49, 34-42 (2000)).



Expression of genes and/or proteins of tumors from cancer patients who have early
recurrence and those who do not is determined in the same way as for the patients who are
tested for the probability of recurrence.


Although the timing of early recurrence varies among different cancer types, it usually
occurs within one or two years after resection. Therefore, tumors from cancer patients who have
recurrence within one or two years after resection can be used as the tumors of patients with early
recurrence, and those from patients who do not have recurrence within one or two years after
resection can be used as the tumors of patients without early recurrence.

Differences in the expression levels or patterns of genes and/or proteins of tumors
between cancer patients who have early recurrence and those who do not can be analyzed and detected by
known methods of statistical analysis. Supervised learning in statistical pattern recognition can be
used for statistical analysis of the expression patterns of genes and/or proteins of tumors. By
supervised learning in statistical pattern recognition, 2 or more genes and/or proteins whose
expression is indicative of cancer recurrence are selected from the examined genes and/or proteins.

Some genes and/or proteins that are indicative of cancer recurrence are first selected by
one-dimensional criteria. Then, the optimal subsets of genes and/or proteins are selected out of
these genes and/or proteins by an exhaustive search with the leave-one-out method, which can take all
the possible combinations of genes and/or proteins into account.

Formulae that are predictive of cancer recurrence are created by using the optimal
subsets of 2 or more genes and/or proteins, preferably 4 or more genes and/or proteins,
more preferably 6 or more genes and/or proteins, and most preferably 12 or more genes and/or
proteins whose expression is indicative of cancer recurrence. Simple classifiers such as the linear
classifier (Duda, R.O. et al. Pattern classification, John Wiley & Sons (2001), and Jain, A.K. et al.
Statistical pattern recognition: A review, IEEE Trans. Pattern Analysis and Machine Intelligence
22, 4-37 (2000)), which work well even if the number of samples is small compared to the number of
genes and/or proteins, are used to create the formulae.

The present invention also concerns kits to carry out the methods of the present
invention. Kits to examine the expression patterns of 2 or more genes and/or proteins that are
indicative of cancer recurrence consist of components including reagents for RNA extraction,
enzymes for the syntheses of cDNA and cRNA, DNA chip, oligonucleotide chip, protein chip,
probes and primers for the analyses, DNA fragments of control genes, and antibodies to various
proteins. Components of the kits are readily available commercially. For instance, oligonucleotide
chips, guanidine-phenol reagent, reverse transcriptase, T7 RNA polymerase and Taq polymerase can
be purchased and assembled for the kits of the present invention.

The following examples merely illustrate the preferred method for the prediction of
cancer recurrence of the present invention, and the invention is not to be construed as being limited thereto.
Examples

Example 1. Selection of the patients for analysis of early intrahepatic recurrence 

It has been reported that early intrahepatic recurrences (within one year) after surgery arise mainly
from intrahepatic metastases, whereas late recurrences are more likely to be of multicentric occurrence
(Poon, R.T. et al. Different risk factors and prognosis for early and late intrahepatic recurrence after
resection of hepatocellular carcinoma, Cancer 89, 500-507 (2000)). Moreover, it is well known that
the outcome of patients with intrahepatic recurrence is worse than that of patients with
multicentric occurrence (Yamamoto, J. et al. Recurrence of hepatocellular carcinoma after surgery,
Br. J. Surg. 83, 1219-1222 (1996), and Poon, R.T. et al. Different risk factors and prognosis for
early and late intrahepatic recurrence after resection of hepatocellular carcinoma, Cancer 89,
500-507 (2000)). Therefore, gene-expression patterns linked to early intrahepatic recurrence, i.e.,
recurrence within one year after surgery, were investigated.



Thirty-three patients underwent surgical treatment for HCC in Yamaguchi University Hospital
between May 1997 and January 2000. Written informed consent was obtained from all patients
before surgery. The study protocol was approved by the Institutional Review Board for Human Use
at the Yamaguchi University School of Medicine in May 1996. A histopathological diagnosis of
HCC was made in all patients after surgery. The histopathological examination also revealed no
residual tumors (R0) in all of the 33 HCC samples. Table 1 shows the clinicopathologic
characteristics of the 33 patients, based on the TNM classification of the Union Internationale Contre le
Cancer (UICC) (Sobin, L.H. & Wittekind, C. TNM classification of Malignant Tumors, 5th ed.,
UICC, Wiley-Liss, 74-77 (1997)). Serologically, 7 patients were hepatitis B surface antigen-positive,
22 patients were anti-HCV antibody-positive, and the remaining 4 patients were negative
for both. The 33 patients were followed for cancer recurrence with ultrasonography, computed
tomography, and alpha-fetoprotein level every 3 months following hepatic resection. Whenever
necessary, magnetic resonance imaging and hepatic angiography were added. Of the 33 HCC
patients, early intrahepatic recurrences were found in 12 (36%). In 11 of the 12 patients, recurrent
HCCs were detected as multiple nodules or diffuse dissemination in the remnant liver. In one patient,
a new tumor was detected as a single nodule in the segment adjacent to the resected primary lesion
9 months after surgery, and then multiple lung metastases were observed. None of the remaining 21
patients had intrahepatic recurrences or other distant metastases within one year after surgery.
These patients were divided into two groups: the patients who had intrahepatic recurrences within
one year formed Group A (n=12) and those who did not formed Group B (n=21) (Table 1). The χ² test and
Fisher's exact test were used to elucidate differences in clinicopathologic factors between the 2
groups.
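
As an illustration of how a two-group comparison of a binary clinicopathologic factor can be tested, the following is a minimal sketch of a two-sided Fisher's exact test written with the Python standard library. The 2 x 2 counts are hypothetical and are not taken from Table 1; the function name and the two-sided convention (summing all tables no more probable than the observed one) are assumptions of this sketch, not a description of the software actually used in the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell with fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is no more probable than the observed one
    # (small relative tolerance guards against floating-point ties).
    return sum(p for p in (prob(k) for k in range(lowest, highest + 1))
               if p <= observed * (1 + 1e-9))


# Hypothetical counts: rows are Group A / Group B, columns are factor present / absent.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))
```

A small p value would suggest that the factor is distributed differently between the two groups; with very small counts, as in this toy table, the exact test is generally preferred over the χ² approximation.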
Example 2. Extraction of the RNA from tissues

Pieces of the tissues (about 125 mm³) were suspended in TRIZOL (Life Technologies,
Gaithersburg, USA, Catalog No. 15596-018) or Sepasol-RNAI (Nacalai Tesque, Kyoto, Japan,
Catalog No. 306-55) and homogenized twice with a Polytron (Kinematica, Littau, Switzerland) (5
sec. at maximum speed). After addition of chloroform, the tissue homogenates were centrifuged at
15,000 x g for 10 min, and the aqueous phases, which contained RNA, were collected. Total cellular
RNA was precipitated with isopropyl alcohol, washed once with 70% ethanol and suspended in
DEPC-treated water (Life Technologies, Gaithersburg, USA, Catalog No. 10813-012). After the RNA
was treated with 1.5 units of DNase I (Life Technologies, Gaithersburg, USA, Catalog No.
18068-015), the RNA was re-extracted with TRIZOL/chloroform, precipitated with ethanol and dissolved
in DEPC-treated water. Thereafter, small molecular weight nucleotides were removed by using the
RNeasy Mini Kit (QIAGEN, Hilden, Germany, Catalog No. 74104) according to the manufacturer's
instruction manual. The quality of the total RNA was judged from the ratio of 28S and 18S ribosomal
RNA after agarose gel electrophoresis. The purified total RNA was stored at -80 °C in 70% ethanol
solution until use.



Example 3. Synthesis of cDNA and labeled cRNA probes 

cDNA was synthesized by using the SuperScript Choice System (Life Technologies,
Gaithersburg, USA, Catalog No. 18090-019) according to the manufacturer's instruction manual.
Five micrograms of the purified total RNA was hybridized with an oligo-dT primer (Sawady
Technology, Tokyo, Japan) that contained the sequence of the T7 promoter, and was incubated with
200 units of SuperScript II reverse transcriptase at 42 °C for 1 hr. The resulting cDNA was
extracted with phenol/chloroform and purified with Phase Lock Gel Light (Eppendorf, Hamburg,
Germany, Catalog No. 0032 005.101).

cRNA was synthesized by using the MEGAscript T7 kit (Ambion, Austin, USA,
Catalog No. 1334) with the cDNA as the template according to the manufacturer's instructions.
Approximately 5 µg of the cDNA was incubated with 2 µl of enzyme mix containing T7 polymerase,
7.5 mM each of adenosine triphosphate (ATP) and guanosine triphosphate (GTP), 5.625 mM each
of cytidine triphosphate (CTP) and uridine triphosphate (UTP), and 1.875 mM each of Bio-11-CTP and
Bio-16-UTP (ENZO Diagnostics, Farmingdale, USA, Catalog No. 42818 and 42814, respectively)
at 37 °C for 6 hr. Mononucleotides and short oligonucleotides were removed by column
chromatography on a CHROMA SPIN +STE-100 column (Clontech, Palo Alto, USA, Catalog No.
K1302-2), and the cRNA in the eluates was sedimented by adding ethanol. The quality of the cRNA
was judged from the length of the cRNA after agarose gel electrophoresis. The purified cRNA was
stored at -80 °C in 70% ethanol solution until use.



Example 4. Gene expression analysis of tumors from patients with and without recurrence 

Gene expression of human primary tumors from liver cancer patients was examined by
high-density oligonucleotide microarrays (HuGeneFL array, Affymetrix, Santa Clara, USA, Catalog
No. 510137) (Lipshutz, R.J. et al. High density synthetic oligonucleotide arrays, Nature Genet. 21,
20-24 (1999)). For hybridization with oligonucleotides on the chips, the cRNA was fragmented at
95 °C for 35 min in a buffer containing 40 mM Tris (Sigma, St. Louis, USA, Catalog No. T1503)-
acetic acid (Wako, Osaka, Japan, Catalog No. 017-00256) (pH 8.1), 100 mM potassium acetate
(Wako, Osaka, Japan, Catalog No. 160-03175), and 30 mM magnesium acetate (Wako, Osaka,
Japan, Catalog No. 130-00095). Hybridization was performed in 200 µl of a buffer containing 0.1 M
2-(N-morpholino)ethanesulfonic acid (MES) (Sigma, St. Louis, USA, Catalog No. M-3885)
(pH 6.7), 1 M NaCl (Nacalai Tesque, Tokyo, Japan, Catalog No. 313-20), 0.01% polyoxyethylene(10)
octylphenyl ether (Wako, Osaka, Japan, Catalog No. 168-11805), 20 µg herring sperm DNA
(Promega, Madison, USA, Catalog No. D181B), 100 µg acetylated bovine serum albumin (Sigma,
St. Louis, USA, Catalog No. B-8894), 10 µg of the fragmented cRNA, and biotinylated control
oligonucleotides, biotin-5'-CTGAACGGTAGCATCTTGAC-3' (Sawady Technology, Tokyo,
Japan), at 45 °C for 12 hr. After washing with a buffer containing 0.01 M MES (pH 6.7),
0.1 M NaCl, and 0.001% polyoxyethylene(10) octylphenyl ether, the chips were incubated with
biotinylated anti-streptavidin antibody (Funakoshi, Tokyo, Japan, Catalog No. BA0500) and
stained with streptavidin R-Phycoerythrin (Molecular Probes, Eugene, USA, Catalog No. S-866) to
increase hybridization signals as described in the instruction manual (Affymetrix, Santa Clara,
USA). Each pixel level was collected with a laser scanner (Affymetrix, Santa Clara, USA), and the
expression level of each cDNA and its reliability (Present/Absent call) were calculated with
Affymetrix GeneChip ver. 3.3 and Affymetrix Microarray Suite ver. 4.0 software. From these
experiments, the expression of 6000 genes in the human primary tumors of liver cancer patients was
determined.
Example 5. Kinetic RT-PCR analysis

Expression of genes was also determined by kinetic RT-PCR. Kinetic RT-PCR was
performed with a real-time fluorescence PCR system. PCR amplification using a LightCycler
instrument (LightCycler system, Roche Diagnostics, Mannheim, Germany, Catalog No. 2011468)
was carried out in 20 µl of reaction mixture consisting of a master mixture and buffer (LightCycler
DNA Master hybridization probes, Roche Diagnostics, Mannheim, Germany, Catalog No.
2158825), 4 mM magnesium chloride (Nacalai Tesque, Tokyo, Japan, Catalog No. 7791-18-6), 10
pmoles of PCR primers (Sawady Technology, Tokyo, Japan), 4 pmoles of fluorescent hybridization
probes (Nihon Genome Research Laboratories, Sendai, Japan), which were designed to hybridize
with the target sequences in a head-to-tail arrangement on the strand of amplified products, and 2 µl
of template cDNA in a LightCycler capillary (Roche Diagnostics, Mannheim, Germany, Catalog No.
1909339). The donor probe was labeled at the 3'-end with fluorescein, while the acceptor probe
was labeled at the 5'-end with LC-Red640 and modified at the 3'-end by phosphorylation to block
extension. The gap between the 3'-end of the donor probe and the 5'-end of the acceptor probe was
between 1 and 3 bases. Prior to amplification, 0.16 µl of TaqStart antibody (Clontech, Palo Alto,
USA, Catalog No. 5400-1) was added to the reaction mixture, followed by incubation
at room temperature for 10 min to block primer elongation. Then, the antibody was inactivated by
incubation at 95 °C for 90 sec., and the amplification was performed in the LightCycler by 40
cycles of incubation at 95 °C for 0 sec. for denaturation, at 57-60 °C for 3-10 sec. for annealing and
at 72 °C for 10 sec. for extension, with a temperature slope of 20 °C/sec. Real-time PCR monitoring
was achieved by measuring the fluorescent signals at the end of the annealing phase in each
amplification cycle. To qualify the integrity of the isolated RNA and normalize the copy number of
target sequences, kinetic RT-PCR analysis for glyceraldehyde-3-phosphate dehydrogenase
(GAPDH) was also carried out by using hybridization probes. External standards for the target
mRNA and GAPDH mRNA were prepared by 10-fold serial dilutions (10³ to 10⁸) of plasmid DNA.
Quantification of mRNA in each sample was performed automatically by reference to the standard
curve constructed at each time point according to the LightCycler software (LightCycler software
version 3, Roche Diagnostics, Mannheim, Germany).
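
The quantification by external standards described above can be sketched as follows. This is a generic standard-curve calculation, not the LightCycler software's own algorithm, which is not described in this document. The term "crossing point", the function names, and the numerical crossing-point values are assumptions made for illustration; only the 10-fold dilution series from 10³ to 10⁸ and the normalization to GAPDH come from the text.

```python
import math


def fit_standard_curve(copies, crossing_points):
    """Least-squares line: crossing point = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(crossing_points) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, crossing_points))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


def copies_from_crossing_point(cp, slope, intercept):
    """Invert the standard curve to estimate the copy number of a sample."""
    return 10 ** ((cp - intercept) / slope)


def normalized_expression(target_copies, gapdh_copies):
    """Express the target as a ratio to GAPDH measured in the same sample."""
    return target_copies / gapdh_copies


# 10-fold serial dilutions from 10^3 to 10^8, as in the text.
standard_copies = [10 ** k for k in range(3, 9)]
# Hypothetical crossing points for the standards (idealized, noise-free).
standard_cps = [40.0 - 3.3 * math.log10(c) for c in standard_copies]

slope, intercept = fit_standard_curve(standard_copies, standard_cps)
target = copies_from_crossing_point(23.5, slope, intercept)   # hypothetical sample
gapdh = copies_from_crossing_point(20.2, slope, intercept)    # hypothetical sample
print(normalized_expression(target, gapdh))
```

In practice a separate standard curve would be fitted for the target mRNA and for GAPDH; a single curve is reused here only to keep the sketch short.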
Example 6. Identification of sets of genes whose expression distinguishes the liver cancer
patients who have early intrahepatic recurrence from the patients who do not have early
intrahepatic recurrence

Early intrahepatic recurrence tended to be associated with the number of primary tumors and
TNM stage, with p values of 0.041 and 0.006, respectively, but not with the other
clinicopathologic factors (Table 1). The number of primary tumors at the time of surgery
distinguished Group A from Group B with only limited sensitivity and specificity (62% and 75%,
respectively). The TNM staging also had limited sensitivity (67%) and specificity (83%) for the
separation of Groups A and B. Thus, it appears that these traditional classifications cannot
reliably predict early intrahepatic recurrence.



Supervised learning in statistical pattern recognition was applied to analyze the data of the high-density
oligonucleotide microarrays. The scoring system was designed with the training samples, and its
performance was validated with the test samples (Fig. 1). In order to maintain independence of the
training and test samples, a cross-validation approach in which the training and the test samples
were interchanged was adopted. The thirty-three available samples were divided into 30 training
samples and 3 test samples by the cross-validation approach (Fig. 1, Step 1). On the basis of the a priori
probability, ten sets of training samples, each consisting of 11 samples from Group A and 19 samples
from Group B, were created. As a result, ten sets of three test samples, each consisting of one from Group
A and two from Group B, were created.
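
The partition into training and test sets described above can be sketched as follows. The document does not state how the ten sets were drawn, so the random sampling, the fixed seed, the sample identifiers, and the function name are all assumptions of this sketch; only the group sizes (12 and 21) and the split sizes (11 + 19 training, 1 + 2 test, ten sets) come from the text.

```python
import random


def make_splits(group_a_ids, group_b_ids, n_sets=10, test_a=1, test_b=2, seed=0):
    """Create n_sets (training, test) pairs, stratified by group."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    splits = []
    for _ in range(n_sets):
        test = rng.sample(group_a_ids, test_a) + rng.sample(group_b_ids, test_b)
        held_out = set(test)
        train = [s for s in group_a_ids + group_b_ids if s not in held_out]
        splits.append((train, test))
    return splits


# Hypothetical sample identifiers for the 12 Group A and 21 Group B tumors.
a_ids = [f"A{i}" for i in range(1, 13)]
b_ids = [f"B{i}" for i in range(1, 22)]

splits = make_splits(a_ids, b_ids)
train, test = splits[0]
print(len(train), len(test))
```

Holding out one Group A and two Group B samples keeps the class proportions of each training set (11:19) close to those of the full cohort (12:21), which is consistent with the use of the a priori probability mentioned above.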



From all the examined genes that had mean average differences of more than twofold between
Groups A and B, fifty useful genes were selected to create the predictive scoring system by using the
Fisher criterion (Fig. 1, Steps 2-3), which is given by the following Formula (I):

$$F(i) = \frac{\left(\mu_A(i) - \mu_B(i)\right)^2}{P(A)\,\sigma_A^2(i) + P(B)\,\sigma_B^2(i)}$$

where $\mu_A(i)$ is the i-th component of the sample mean vector $\mu_A$ of Group A, $\sigma_A^2(i)$ is the
i-th diagonal element of the sample covariance matrix $\Sigma_A$ of Group A, and $P(A)$ is the a
priori probability of Group A.
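
A minimal sketch of ranking genes by the Fisher criterion is shown below, using only the Python standard library. The function names and toy expression values are hypothetical. Two details are assumptions, since the document does not specify them: the sample variance uses the unbiased (n - 1) estimator, and the a priori probabilities are taken as the group proportions in the training data.

```python
from statistics import mean, variance


def fisher_criterion(group_a, group_b):
    """Return F(i) for every gene i.

    group_a, group_b: lists of samples, each sample a list of expression values.
    """
    n_a, n_b = len(group_a), len(group_b)
    prior_a = n_a / (n_a + n_b)   # assumed: priors estimated from group sizes
    prior_b = 1.0 - prior_a
    scores = []
    for i in range(len(group_a[0])):
        a = [sample[i] for sample in group_a]
        b = [sample[i] for sample in group_b]
        numerator = (mean(a) - mean(b)) ** 2
        # variance() is the unbiased sample variance (an assumption, see lead-in).
        denominator = prior_a * variance(a) + prior_b * variance(b)
        scores.append(numerator / denominator)
    return scores


def top_genes(scores, k):
    """Indices of the k genes with the largest Fisher criterion."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Toy data: 2 samples per group, 2 genes; gene 0 separates the groups well.
toy_a = [[1.0, 5.0], [3.0, 5.5]]
toy_b = [[10.0, 5.2], [12.0, 4.8]]
scores = fisher_criterion(toy_a, toy_b)
print(scores, top_genes(scores, 1))
```

In the procedure described in the text, the same ranking would be applied to the genes passing the twofold filter, keeping the 50 highest-scoring genes.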



Then, the optimal subset of the genes for the scoring system was identified as described below.
The Fisher linear classifier assigns a test sample x to Group A according to the following
Formula (II):

$$F_A(x) < F_B(x)$$

where

$$F_A(x) = \frac{1}{2}\,(x - \mu_A)^T\,\Sigma_W^{-1}\,(x - \mu_A) - \ln P(A)$$

$$\Sigma_W = P(A)\,\Sigma_A + P(B)\,\Sigma_B$$

and $F_B(x)$ is defined analogously with $\mu_B$ and $P(B)$.

In the leave-one-out method, the sample mean vector, sample covariance matrix, and a
priori probability were estimated by using 29 samples as training samples. Then, the resulting
Fisher linear classifier was tested on the remaining sample as a pseudo-test sample. This operation
was repeated 30 times, so that each of the 30 training samples served once as the pseudo-test
sample. The error rate was calculated for each possible subset of the genes. For
example, when selecting 5 genes out of 50, the number of subsets to be examined is about 2.1 million
(2,118,760).

Next, candidate gene subsets minimizing the error rate were selected (Fig. 1, Step 4). This trial was
independently repeated 10 times (Fig. 1, Step 5).
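The exhaustive search with leave-one-out error estimation can be sketched as follows. To keep the example short, a nearest-group-mean rule is used as a stand-in classifier; it is not the Fisher linear classifier of the text, and any function with the same signature could be passed in its place. The names and toy data are hypothetical.

```python
from itertools import combinations
from math import comb

def nearest_mean_predict(train, labels, x):
    """Stand-in classifier (not the Fisher linear classifier): pick the closer group mean."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(labels)):
        rows = [r for r, l in zip(train, labels) if l == label]
        mu = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((xi - mi) ** 2 for xi, mi in zip(x, mu))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loo_error(samples, labels, genes, predict=nearest_mean_predict):
    """Leave-one-out error rate using only the gene indices in `genes`."""
    errors = 0
    for i in range(len(samples)):
        train = [[s[g] for g in genes] for j, s in enumerate(samples) if j != i]
        train_labels = [l for j, l in enumerate(labels) if j != i]
        held_out = [samples[i][g] for g in genes]
        if predict(train, train_labels, held_out) != labels[i]:
            errors += 1
    return errors / len(samples)

def best_subsets(samples, labels, n_genes, subset_size):
    """Score every subset of the given size; return those with the minimum error rate."""
    scored = {subset: loo_error(samples, labels, subset)
              for subset in combinations(range(n_genes), subset_size)}
    best = min(scored.values())
    return [s for s, e in scored.items() if e == best], best

# Number of 5-gene subsets that an exhaustive search over 50 genes must examine.
print(comb(50, 5))

# Made-up data: gene 0 separates the groups, genes 1 and 2 do not.
samples = [[1.0, 5.0, 3.0], [1.2, 3.0, 4.0], [0.9, 4.0, 5.0],
           [3.0, 4.5, 4.0], [3.1, 3.5, 3.0], [2.9, 5.0, 5.0]]
labels = ["A", "A", "A", "B", "B", "B"]
subsets, error = best_subsets(samples, labels, n_genes=3, subset_size=1)
print(subsets, error)
```

Because the cost grows combinatorially with subset size, the pre-selection to 50 genes is what keeps the exhaustive search tractable.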



Among the candidate gene subsets, the gene subset that appeared most frequently throughout the 10
trials was selected as the optimal subset of genes for discriminating the two groups (Fig. 1,
Step 6). Using the selected optimal subset of genes, the score T is given by the following Formula
(III):

$$T(x) = F_A(x) - F_B(x)$$

In this scoring system, all HCCs with a negative T value are classified into Group A (early
intrahepatic recurrence group) and all HCCs with a positive T value are classified into Group B
(nonrecurrence group).
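As a worked step, and assuming Formula (II) reads $F_A(x) = \frac{1}{2}(x-\mu_A)^T \Sigma_W^{-1}(x-\mu_A) - \ln P(A)$ with $F_B(x)$ defined analogously, the score T reduces to a linear function of x. The quadratic term $x^T \Sigma_W^{-1} x$ cancels because both groups share the same symmetric matrix $\Sigma_W$:

```latex
\begin{aligned}
T(x) &= F_A(x) - F_B(x) \\
     &= \tfrac{1}{2}(x-\mu_A)^T \Sigma_W^{-1} (x-\mu_A)
      - \tfrac{1}{2}(x-\mu_B)^T \Sigma_W^{-1} (x-\mu_B)
      - \ln P(A) + \ln P(B) \\
     &= (\mu_B - \mu_A)^T \Sigma_W^{-1} x
      + \tfrac{1}{2}\left(\mu_A^T \Sigma_W^{-1} \mu_A - \mu_B^T \Sigma_W^{-1} \mu_B\right)
      + \ln\frac{P(B)}{P(A)}
\end{aligned}
```

The first term supplies one coefficient per gene and the remaining terms form a constant, which matches the shape of a weighted sum of expression values plus an intercept.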

The optimal number of genes was determined according to the criterion J, which is given by the
following Formula (IV) (Fig. 1, Step 7).


Ja ^ c S T(x) "xS T(x)] 

The criterion J measures the separability of Group A from Group B. The average and 95% confidence
interval of the J values in the 10 different training sets were computed for various numbers of genes
(Fig. 2). The separability improved as the number of genes increased.
The 95% confidence intervals became nearly identical when the number of
genes reached 10 and 12, indicating that 12 is the most appropriate number of genes for
separating the two groups (Fig. 2).
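The text does not state how the 95% confidence interval of J was computed. One common choice, shown here purely as an assumption, is a Student's t interval over the ten per-training-set values. The J values below are invented for illustration and are not data from this document.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% critical value of Student's t with 9 degrees of freedom (10 values).
T_CRIT_DF9 = 2.262

def mean_ci95(values, t_crit=T_CRIT_DF9):
    """Mean and t-based 95% confidence interval of a small sample."""
    m = mean(values)
    half_width = t_crit * stdev(values) / sqrt(len(values))
    return m, (m - half_width, m + half_width)

# Invented J values for ten training sets at one fixed number of genes.
j_values = [4.1, 3.8, 4.4, 4.0, 3.9, 4.3, 4.2, 3.7, 4.0, 4.1]
j_mean, (j_low, j_high) = mean_ci95(j_values)
print(j_mean, j_low, j_high)
```

Repeating this for each candidate number of genes gives the kind of mean-and-interval comparison that the paragraph describes.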



Example 7. The optimal subset of 12 genes whose expression is indicative of early
intrahepatic recurrence

According to the algorithm described above, the optimal subset of 12 genes that
discriminates Group A from Group B was identified. The optimal gene subset consisted of the genes
for platelet-derived growth factor receptor alpha (PDGFRA), tumor necrosis factor alpha (TNF-α)
inducible protein A20, lysosomal-associated multitransmembrane protein (LAPTm5), HLA-DR
alpha heavy chain, rel proto-oncogene, Staf50, putative serine/threonine protein kinase,
MADS/MEF2-family transcription factor (MEF2C), HUMLUCA19 Human cosmid clone LUCA19
from 3p21.3, DEAD-box protein p72, vimentin and KIAK0002 (Table 2). Of the 12 genes selected,
eleven were down-regulated in Group A; the means of the average differences of
these genes in Group A were less than half of those in Group B (Fig. 3). In contrast,
HUMLUCA19 gene expression was up-regulated in Group A; the mean of the average differences
of the HUMLUCA19 gene in Group A was more than 3-fold higher than that in
Group B (Fig. 3). The accuracy of the scoring system for the prediction of early intrahepatic
recurrence was evaluated with the 10 different sets of 3 test samples (Fig. 4). Early recurrence of
HCC is predicted by calculating the T value from the expression of the 12 genes in tumors from HCC
patients. Recurrence within one year after surgery is very likely when the T value is below zero,
and quite unlikely when the T value is above zero. The scoring system correctly predicted early
intrahepatic recurrence for the 3 test samples in all 10 trials (Fig. 4). The scoring system was
independent of viral infection patterns and was much more accurate than the TNM staging system
(Fig. 4). The scoring system based on all 33 HCCs with the above 12 genes (Fig. 5) is given by the
following Formula (V).

Formula (V)

$$T(x) = 0.053862x_1 + 0.038848x_2 + 0.030176x_3 + 0.001824x_4 + 0.096997x_5 + 0.017259x_6 + 0.015908x_7 + 0.103081x_8 - 0.093746x_9 + 0.024031x_{10} - 0.005417x_{11} - 0.119177x_{12} - 11.046007$$

where $x_1, x_2, \ldots, x_{12}$ are the normalized average differences of the
mRNAs for platelet-derived growth factor receptor alpha (PDGFRA), tumor necrosis factor alpha
(TNF-α) inducible protein A20, lysosomal-associated multitransmembrane protein (LAPTm5),
HLA-DR alpha heavy chain, rel proto-oncogene, Staf50, putative serine/threonine protein kinase,
MADS/MEF2-family transcription factor (MEF2C), HUMLUCA19 Human cosmid clone LUCA19
from 3p21.3, DEAD-box protein p72, vimentin and the KIAK0002 gene, respectively (Table 2).
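Formula (V) can be evaluated directly, as in the sketch below. The coefficients and intercept are those of Formula (V); the function names are hypothetical, the input is assumed to be already normalized in the same way as in the document, and the handling of a score of exactly zero is an assumption because the text only covers values below and above zero.

```python
# Coefficients of Formula (V), in the order x1 ... x12.
COEFFICIENTS = [0.053862, 0.038848, 0.030176, 0.001824, 0.096997, 0.017259,
                0.015908, 0.103081, -0.093746, 0.024031, -0.005417, -0.119177]
INTERCEPT = -11.046007

GENES = ["PDGFRA", "TNF-alpha inducible protein A20", "LAPTm5",
         "HLA-DR alpha heavy chain", "rel proto-oncogene", "Staf50",
         "putative serine/threonine protein kinase", "MEF2C", "HUMLUCA19",
         "DEAD-box protein p72", "vimentin", "KIAK0002"]

def score(x):
    """T(x) for a list of 12 normalized average differences, ordered as in GENES."""
    if len(x) != len(COEFFICIENTS):
        raise ValueError("expected 12 expression values")
    return sum(c * v for c, v in zip(COEFFICIENTS, x)) + INTERCEPT

def predict_group(x):
    """Group A (early recurrence) if T < 0, else Group B; T == 0 goes to B by assumption."""
    return "A" if score(x) < 0 else "B"

# With all expression values at zero, only the intercept remains.
print(score([0.0] * 12), predict_group([0.0] * 12))
```

Because the HUMLUCA19 coefficient is negative, higher expression of that gene lowers T, in line with its reported up-regulation in Group A.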


The 12 genes selected by the present invention are involved in a wide range of biological processes.
Of these, immune response-related genes such as HLA-DR alpha heavy chain, TNF-α inducible
protein A20 and Staf50 were down-regulated in HCCs with early intrahepatic recurrence. Because
HLA-DR alpha heavy chain is considered to play an important role in antigen presentation by
macrophages (Tissot, C. & Mechti, N. Molecular cloning of a new interferon-induced factor that
represses human immunodeficiency virus type 1 long terminal repeat expression. J. Biol. Chem. 270,
14891-14898 (1995)), its down-regulation in tumorous tissues might facilitate escape of tumor cells
from host immune surveillance. Rel proto-oncogene, which, together with NF-κB, is involved in
intracellular signaling pathways, was also down-regulated in HCCs with early intrahepatic recurrence.
Furthermore, the expression of rel/NF-κB has been reported to be associated with T-cell activation
(Mora, A. et al. NF-kappa B/Rel participation in the lymphokine-dependent proliferation of T
lymphoid cells. J. Immunol. 166, 2218-2227 (2000)). Thus, it seems that several of the genes
selected by the present invention for predicting early intrahepatic recurrence are involved in
weakening the host immune response against HCC cells possessing high metastatic potential.

Gene expression patterns of other HCC patients whose follow-up period recently reached
one year were also analyzed by oligonucleotide microarray, and the scores of the expression
of the 12 genes were calculated according to the formula described above. The T values of patients
who lived without recurrence for more than one year after surgery were positive (plus score),
and that of the other patient, who had intrahepatic recurrence within one year after surgery,
was negative (minus score). Thus, the scoring system consisting of the subset of 12 genes
selected from 6000 genes could predict early intrahepatic recurrence accurately. The application
of supervised learning in statistical pattern recognition to clinical specimens may provide
key information for advances in the prevention, diagnosis, and treatment of other diseases as
well as HCC. Furthermore, not only DNA microarrays but also other methods such as RT-PCR
can be used to determine the expression of the optimal sets of genes.



Table 1

Clinicopathologic factors of the HCCs used for the prediction of early intrahepatic recurrence.

Factors                                  Group A (n = 12)   Group B (n = 21)   P value

Sex                                                                            N.S.
  Male                                          8                 16
  Female                                        4                  5
Age (only the label ">60" is legible)                                          N.S.
  First category                                5                  7
  Second category                               7                 14
Viral infection                                                                N.S.
  HBV                                           3                  4
  HCV                                           8                 14
  Non-B, non-C                                  1                  3
Primary lesion                                                                 0.041
  Single tumor                                  3                 13
  Multiple tumors                               9                  8
Tumor size (cm)                                                                N.S.
  <2.0                                          0                  5
  2.0-5.0                                       8                 14
  >5.0                                          4                  2
Stage*                                                                         0.006
  I/II                                          2                 14
  IIIA/IVA                                     10                  7
Histological grading*                                                          N.S.
  G1                                            0                  2
  G2                                            9                 17
  G3                                            3                  2
Venous invasion*                                                               N.S.
  (-)                                           7                 18
  (+)                                           5                  3
Non-tumorous liver                                                             N.S.
  Non-specific change                           1                  1
  Chronic hepatitis                             2                 10
  Liver cirrhosis                               9                 10

*, Assessment based on the TNM classification of the UICC

HBV: hepatitis B virus, HCV: hepatitis C virus, non-B, non-C: neither HBV nor HCV

Group A: early intrahepatic recurrence (+), Group B: early intrahepatic recurrence (-)

N.S.: Not significant
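The document does not say which statistical test produced the P values in Table 1. As an assumption only, an uncorrected Pearson chi-square test on the two-by-two counts for primary lesion and stage gives values close to the reported 0.041 and 0.006. The function name is hypothetical; the counts are those listed in the table.

```python
from math import erfc, sqrt

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value

# Rows are Group A and Group B; columns are the two categories of each factor.
_, p_lesion = chi_square_2x2(3, 9, 13, 8)   # single vs. multiple tumors
_, p_stage = chi_square_2x2(2, 10, 14, 7)   # lower vs. higher stage category
print(round(p_lesion, 3), round(p_stage, 3))
```

Agreement with the reported values suggests, but does not prove, that a test of this kind was used.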
Table 2

The formula and the 12 genes to predict early intrahepatic recurrence.

Formula

$$T(x) = 0.053862x_1 + 0.038848x_2 + 0.030176x_3 + 0.001824x_4 + 0.096997x_5 + 0.017259x_6 + 0.015908x_7 + 0.103081x_8 - 0.093746x_9 + 0.024031x_{10} - 0.005417x_{11} - 0.119177x_{12} - 11.046007$$

Variable   GB*        Description

x_1        M21574     platelet-derived growth factor receptor alpha (PDGFRA)
x_2        M59465     tumor necrosis factor alpha inducible protein A20
x_3        U51240     lysosomal-associated multitransmembrane protein (LAPTm5)
x_4        X00274     HLA-DR alpha heavy chain (class II antigen)
x_5        X75042     rel proto-oncogene
x_6        X82200     Staf50
x_7        Y10032     putative serine/threonine protein kinase
x_8        L08895     MADS/MEF2-family transcription factor (MEF2C)
x_9        AC000063   HUMLUCA19 Human cosmid clone LUCA19 from 3p21.3
x_10       U59321     DEAD-box protein p72
x_11       Z19554     vimentin
x_12       D13639     KIAK0002 gene

GB*: GenBank accession number
Claims

1. A scoring system for the prediction of cancer recurrence using 2 or more genes and/or proteins
selected by statistical analyses based on expression levels or patterns of genes and/or proteins of 
cancer tissues from human cancer patients who have recurrence and those who do not have 
recurrence. 

2. The scoring system according to claim 1, wherein the number of genes and/or proteins selected by 
statistical analyses is 4 or more. 

3. The scoring system according to claim 1, wherein the number of genes and/or proteins selected by 
statistical analyses is 6 or more. 

4. The scoring system according to claim 1, wherein the number of genes and/or proteins selected by 
statistical analyses is 12 or more. 

5. The scoring system according to claims 1-4, wherein the cancer tissues are liver cancer tissues. 

6. The scoring system according to claims 1-5, wherein the expression of genes and/or proteins of 
human cancer tissues and the expression of genes and/or proteins of human primary cancer 
tissues from patients who have recurrence and those who do not have recurrence are examined by 
means of DNA microarray, reverse transcription polymerase-chain reaction or protein array. 

7. A kit for carrying out the scoring system according to claims 1-6 comprising DNA chip, 
oligonucleotide chip, protein chip, probes or primers that are necessary for effecting DNA 
microarrays, oligonucleotide microarrays, protein arrays, northern blotting, RNase protection 
assays, western blotting, and reverse transcription polymerase-chain reaction to examine the 
expression of genes and/or proteins selected by the scoring system according to claims 1-6. 

8. A method for the prediction of cancer recurrence, comprising the steps of: 

(a) examining the expression levels or patterns of genes and/or proteins in the samples prepared 
from the cancer tissues of patients, wherein said genes and/or proteins are selected by the 
scoring system according to claims 1-6; and, 

(b) predicting cancer recurrence of patients by applying the expression levels or patterns of 
genes and/or proteins examined in step (a) to the scoring system according to claims 1-6. 
Fig. 1



Step 1. Divide 33 available samples into 30 training samples and 3 test samples by the 
cross-validation method. 

Step 2. Compute the Fisher criterion using 30 training samples. 

Step 3. Select the top 50 genes for discrimination between Groups A and B based on both
the Fisher criterion and fold-change.

Step 4. Select the candidate gene subsets out of 50 genes by the exhaustive search with the 
leave-one-out method. 

Step 5. Repeat Steps 1-4 independently (10 times). 

Step 6. Select the optimal gene subset which most frequently appears in 10 trials. 

Step 7. Determine the optimal number of the genes according to the criterion J computed 
with the 10 different sets of the training samples. 



Step 8. Design the scoring system using 30 training samples selected in Step 1. 

Step 9. Discriminate 3 test samples selected in Step 1 by the scoring system designed in 
Step 8. 

Step 10. Repeat Steps 8 and 9 independently 10 times with 10 different sets of the training 
and test samples. 
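Step 6 above, choosing the gene subset that recurs most often across the trials, can be sketched as follows. The function name and the gene labels are hypothetical, and counting each subset at most once per trial regardless of gene order is an assumption about how "most frequently appears" is tallied.

```python
from collections import Counter

def most_frequent_subset(trials):
    """trials holds one list of candidate gene subsets per trial.

    Returns the subset found in the largest number of trials and that count.
    """
    counts = Counter()
    for candidates in trials:
        # A subset counts once per trial, and gene order within a subset is ignored.
        counts.update({frozenset(subset) for subset in candidates})
    subset, n_trials = counts.most_common(1)[0]
    return sorted(subset), n_trials

# Made-up candidate subsets from three trials.
trials = [
    [("g1", "g2"), ("g1", "g3")],
    [("g2", "g1")],
    [("g1", "g3"), ("g2", "g1")],
]
best, seen_in = most_frequent_subset(trials)
print(best, seen_in)
```

If two subsets appear in the same number of trials, this sketch returns one of them without any further tie-breaking rule, which the text does not specify either.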
[Figure: plot with x-axis "Number of genes" (ticks at 4, 6, 8, 10, 12)]

Fig. 4